Yuzhang Shang

V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

Feb 05, 2026
Via arXiv

Real-Time Robot Execution with Masked Action Chunking

Jan 27, 2026
Via arXiv

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models

Jan 20, 2026
Via arXiv

Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

Jan 15, 2026
Via arXiv

PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache

Jan 07, 2026
Via arXiv

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Dec 19, 2025
Via arXiv

Distill Video Datasets into Images

Dec 16, 2025
Via arXiv

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Nov 17, 2025
Via arXiv

Efficient Multimodal Dataset Distillation via Generative Models

Sep 18, 2025
Via arXiv

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

Sep 11, 2025
Via arXiv